An Intrinsic Information Content Metric for Semantic Similarity in WordNet
Abstract
Information Content (IC) is an important dimension of word knowledge when assessing the similarity of two terms or word senses. The conventional way of measuring the IC of word senses is to combine knowledge of their hierarchical structure from an ontology like WordNet with statistics on their actual usage in text as derived from a large corpus. In this paper we present a wholly intrinsic measure of IC that relies on hierarchical structure alone. We report that this measure is consequently easier to calculate, yet when used as the basis of a similarity mechanism it yields judgments that correlate more closely with human assessments than other, extrinsic measures of IC that additionally employ corpus analysis.
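The abstract does not state the formula, so the following is only a minimal sketch of how an IC value can be derived from hierarchy structure alone. It assumes a hyponym-count formulation, IC(c) = 1 - log(hypo(c) + 1) / log(N), where hypo(c) is the number of concepts below c and N is the total number of concepts. The toy taxonomy, the function names, and the choice of a Resnik-style similarity (IC of the most informative common subsumer) are illustrative assumptions, not details taken from this page.

```python
import math

# Hypothetical toy taxonomy (child -> parent); NOT real WordNet data.
TOY_PARENTS = {
    "animal": "entity",
    "plant": "entity",
    "dog": "animal",
    "cat": "animal",
    "poodle": "dog",
}


def ancestors(parents, concept):
    """Return the concept followed by all of its ancestors up to the root."""
    chain = [concept]
    while chain[-1] in parents:
        chain.append(parents[chain[-1]])
    return chain


def hyponym_counts(parents):
    """Count, for every concept, how many concepts lie below it."""
    concepts = set(parents) | set(parents.values())
    counts = {c: 0 for c in concepts}
    for child in parents:
        # Each concept contributes one hyponym to each of its proper ancestors.
        for ancestor in ancestors(parents, child)[1:]:
            counts[ancestor] += 1
    return counts


def intrinsic_ic(parents):
    """Assumed formulation: IC(c) = 1 - log(hypo(c) + 1) / log(N)."""
    counts = hyponym_counts(parents)
    total = len(counts)
    return {c: 1.0 - math.log(n + 1) / math.log(total) for c, n in counts.items()}


def resnik_similarity(parents, ic, a, b):
    """Resnik-style similarity: IC of the most informative common subsumer."""
    common = set(ancestors(parents, a)) & set(ancestors(parents, b))
    return max(ic[c] for c in common)


ic = intrinsic_ic(TOY_PARENTS)
print(ic["entity"], ic["poodle"])
print(resnik_similarity(TOY_PARENTS, ic, "dog", "cat"))
```

In this sketch leaf concepts receive the maximum IC of 1 and the root receives 0, with no corpus counts involved. The single-parent dictionary is a simplification: WordNet allows a synset to have several hypernyms, which a fuller implementation would need to handle.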
Related resources
A semantic similarity metric combining features and intrinsic information content
In many research fields such as Psychology, Linguistics, Cognitive Science and Artificial Intelligence, computing semantic similarity between words is an important issue. In this paper a new semantic similarity metric, that exploits some notions of the feature based theory of similarity and translates it into the information theoretic domain, which leverages the notion of Information Content (I...
Automatic Construction of Persian ICT WordNet using Princeton WordNet
WordNet is a large lexical database of the English language in which nouns, verbs, adjectives, and adverbs are grouped into sets of cognitive synonyms (synsets). Each synset expresses a distinct concept. Synsets are interlinked by both semantic and lexical relations. WordNet is essentially used for word sense disambiguation, information retrieval, and text translation. In this paper, we propose s...
A Novel Information Theoretic Framework for Finding Semantic Similarity in WordNet
Information content (IC) based measures of semantic similarity are steadily gaining favor. The semantics of concepts can be characterized well by information theory. The conventional way of calculating IC is based on the probability of appearance of concepts in corpora. Due to data sparseness and corpora dependency issues of those conventional approaches, a new corpora independen...
Metric of intrinsic information content for measuring semantic similarity in an ontology
Measuring information content (IC) from the intrinsic information of an ontology is an important but formidable task. IC is useful for the subsequent measurement of semantic similarity. Although state-of-the-art metrics measure IC, they rely either on an external knowledge base or on intrinsic hyponymy relations only. A current complex form of ontology conceptualizes a class (also often called as a con...
Semantic Similarity Measure Using Information Content Approach With Depth For Similarity Calculation
Similarity is a criterion for measuring the nearness or proximity between two concepts. Several algorithmic approaches for computing similarity have been proposed. The majority of existing similarity measures use WordNet as the underlying ontology for calculating semantic similarity. WordNet is a lexical database for the English language which was created and maintained by Cognitive Science ...
Publication date: 2004